Limit user resources on *geniux* #126
Through ignorance or inattention, users run calculations on our gateway server *geniux*, affecting all other users. Prevent that technically by limiting each user's resources to one CPU and ten percent of the memory. See systemd.resource-control(5) for more details. The current resource limits for user id 133 can be checked with `systemd-cgls` and `systemctl status user-133.slice`. Users can still cripple the system with high IO and network load.
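A minimal sketch of what such a limit could look like as a systemd drop-in, rendered from Python. The diff itself is not shown here, so everything below is an assumption: the helper name `render_user_slice_dropin` is hypothetical, and `CPUQuota=` and `MemoryMax=` are simply the directives from systemd.resource-control(5) that match "one CPU" and "ten percent of the memory" (`MemoryMax=` needs the unified cgroup v2 hierarchy).

```python
def render_user_slice_dropin(cpu_quota_percent: int, memory_max: str) -> str:
    """Return the text of a [Slice] drop-in limiting CPU time and memory.

    Hypothetical helper: the directives are real systemd settings, but
    whether the pull request uses exactly these is an assumption.
    """
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    lines = [
        "[Slice]",
        # 100% corresponds to the full time of one CPU.
        f"CPUQuota={cpu_quota_percent}%",
        # Accepts a percentage of physical memory or an absolute size.
        f"MemoryMax={memory_max}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One CPU and ten percent of the memory, as in the description above.
    print(render_user_slice_dropin(100, "10%"), end="")
```

On systemd versions that support `user-.slice.d` drop-ins, the output could be placed in a file such as `/etc/systemd/system/user-.slice.d/50-limits.conf` (the file name is made up for illustration), followed by `systemctl daemon-reload`.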
Force-pushed from 60ac109 to feef63b.
I'll object to any form of resource limiting by a "percentage" value. This is wrong.
Care to elaborate? Anyway, please suggest absolute values then.
Good idea. We should try.
You already suggested values by setting it to 10%. 10% of "tested on X" or 10% of "tested on Y"... I don't really know what you would like to set it to. You had 3 GB on stitch and 6 GB on geniux. So which one would it be? If you had tested it on nomnomnom, it would be 200 GB... That is my objection to percentage values: they age like milk. The solution would be to reserve memory and CPU for a certain user, not to restrict memory and CPU for all others. Then this user (root) should be able to fix the system, manually or with magic. You still end up with an unresponsive system when several people go to their limits.
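The arithmetic behind this objection can be sketched as follows. The total memory per host is not stated in the thread; the figures below are inferred by working backwards from the quoted 10% values (3 GB, 6 GB, 200 GB), and `percent_limit_gb` is a hypothetical helper.

```python
def percent_limit_gb(total_gb: float, percent: float) -> float:
    """Return the per-user limit in GB that a percentage setting yields."""
    return total_gb * percent / 100


# Assumption: totals inferred from the 10% values quoted in the comment above.
INFERRED_TOTAL_GB = {"stitch": 30, "geniux": 60, "nomnomnom": 2000}

if __name__ == "__main__":
    for host, total_gb in INFERRED_TOTAL_GB.items():
        relative = percent_limit_gb(total_gb, 10)
        print(f"{host}: 10% gives {relative:g} GB per user")
```

The same 10% setting produces a different absolute limit on every host, which is the point of contention: a relative value follows the hardware, while an absolute value stays fixed until someone edits it.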
Do not apply the user resource limits to user *root*.
Absolute values are preferred by some, so arbitrarily choose 3 GB. (Before it would have been around 6 GB on *geniux*, which seems excessive.)
It depends on what your goals are. Resource limits, like schedulers, are only a heuristic. Relative values also stay current.
How should that work?
Good point. I'll try to exclude that user.
The proposed solution will hopefully solve most of the issues we were seeing with our gateway server.
@wwwutz: 3 GB/user limit okay with you for a test run? Last time we had a berserk user process on geniux, I needed over 15 minutes to log in to geniux via bka, identify the process and kill it. During that time nobody could work from home. So I think this is a problem we need to address using the options we have.
We found out that geniux still has a swap file, which might explain the laaaaaag when running out of physical memory. Without a swap file and with a restrictive overcommit policy, the problematic user jobs probably would have died right away. So as an alternative we might disable the swap on geniux.
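A small sketch of how the two settings mentioned here could be inspected. It parses text in the format of `/proc/meminfo` and interprets the value of `/proc/sys/vm/overcommit_memory`. The sample text is made up for illustration and is not taken from geniux, and the function names are hypothetical.

```python
# Meaning of the vm.overcommit_memory sysctl values.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "no overcommit (strict accounting)",
}


def swap_total_kib(meminfo_text: str) -> int:
    """Extract the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


def describe(meminfo_text: str, overcommit_mode: int) -> str:
    """Summarize swap and overcommit state in one line."""
    swap = swap_total_kib(meminfo_text)
    swap_state = "swap disabled" if swap == 0 else f"swap enabled ({swap} kB)"
    return f"{swap_state}, {OVERCOMMIT_MODES[overcommit_mode]}"


# Made-up sample; on a real host, read /proc/meminfo instead.
SAMPLE_MEMINFO = """\
MemTotal:       61680004 kB
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
"""

if __name__ == "__main__":
    print(describe(SAMPLE_MEMINFO, 0))
```

The "restrictive overcommit policy" referred to above would correspond to mode 2, combined with a `SwapTotal` of zero.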
This. Thanks, Grandpa
Is the claim regarding swap still valid after reading the article "In defence of swap: common misconceptions", which you shared on June 30th? At least in June, there were still some out-of-memory situations.
Claim 3 is not true: if I have no swap, I prevent I/O from the swapper. Claim 5 is a typical "we all have SSDs anyway" argument. We don't; we have slow disks. Claim 6 refers to the OOM killer, which in my opinion has no business being in the system as long as you cannot teach it which processes it must not touch.
The argument there is that the I/O takes place anyway. Without swap, anonymous pages (those without a file backend) cannot be freed, so pages with a file backend would be evicted instead when free pages are needed. The I/O then goes through the filesystems rather than through swap, so it would still be there. The I/O would even be higher: because the inactive anonymous dirty pages are not available for reclaim, active pages would have to be freed, and those then have to be faulted back in.
Maybe it got lost in the long discussion, but I claim limiting the resources is still useful and needed, as disabling swap did not help.
Was the system in an unusable state after swap was disabled?
Yes, the OOM became active. (I just saw it in the logs, but having processes killed belongs to the unusable category to me.)
"the OOM became active" and "belongs to the unusable category to me": that is not what this was about. It was about no longer being able to maintain a system because you cannot log in, because it became unresponsive (due to high I/O). That has nothing to do with an OOM killer. The OOM killer is merely annoying, but it does not make the system unresponsive. And if it does, it is doing something wrong. Then it has to go. Veto.
This merge/pull request is for limiting the memory resources for processes, because it was making geniux unusable. The OOM is often too late (and that is why oomd was written as an alternative, for example). No idea what it has to do with I/O. What am I missing? I have the feeling we are talking past each other.
OK, then put it this way: No, I do not think it makes sense to limit the memory resources of individual users or processes on a machine, for whatever reason one might want to do that. No, I do not want that. No. No. No. I fear that this special configuration of a single server will create more problems for us than it solves. So: veto against this pull request.
And another occurrence.
I am missing an alternative proposal to fix the annoying problem.
Force-pushed from 91ee9fc to 647c337.
Tested on stitch and geniux.